Abstract

We discuss which set of explanatory variables is appropriate for predicting the absolute magnitude
at maximum of Type Ia supernovae. For a good prediction, the error for future data,
called the “generalization error,” should be small. We use cross-validation to control the
generalization error and a LASSO-type estimator to choose the set of variables. This approach can
be used even when the number of samples is smaller than the number of candidate variables.

We applied this approach to the Berkeley supernova database. The candidate explanatory variables
include normalized spectral data, line variables, and previously proposed flux ratios, as well as the
color and light-curve width. As a result, we confirmed the established understanding of Type Ia supernovae:
i) the absolute magnitude at maximum depends on the color and light-curve width, and ii) the light-curve
width depends on the strength of Si II. Recent studies have suggested adding more variables to
explain the absolute magnitude. However, our analysis does not support adding any other variables in
order to obtain a better generalization error.

1. Introduction

Type Ia supernovae (SNe Ia) have been used as “standard candles” to estimate the distance to
galaxies in cosmology. Phillips (1993) found a significant correlation between their absolute
magnitude at maximum, $M$, and decay rate, and proposed that a better distance indicator can
be obtained by calibrating it. As well as the decay rate, the observed color also exhibits a clear
correlation with $M$. This is mainly due to the interstellar extinction in both the host galaxy and
our own, while it has been proposed that there is also a variation in the intrinsic color of SNe Ia
at maximum (Conley et al. 2007; Foley, Kasen 2011). In addition to these two, a number of
variables have been proposed as explanatory variables of $M$, for example, the equivalent widths,
velocities, or depths of absorption lines, or their ratios (for a review, see Silverman et al. 2012).

The search for a good set of variables, in other words, the “model,” has recently been
intensified by including arbitrary ratios of the fluxes in spectra. Using 58 objects observed by the
Nearby Supernova Factory, Bailey et al. (2009) report that the model with a single ratio of the flux
at 642 nm to that at 443 nm, hereafter $\mathcal{R}$(642 nm/443 nm), has a smaller residual of $M$
than the classical model with the color and decay rate (or light-curve width). Using 26 objects
observed by the CfA Supernova Program, Blondin et al. (2011) confirm the conclusion of Bailey et al.
(2009) with a slightly different ratio, $\mathcal{R}$(6630 Å/4400 Å), although the improvement of
the model has low significance. In addition, they propose another model with the color and the
color-corrected flux ratio, $\mathcal{R}^c$(4610 Å/4260 Å), at $t = -2.5$ d from maximum light.
Silverman et al. (2012), using 62 objects observed by the Berkeley Supernova Ia Program, report
that the best set of variables is the light-curve width, color, and $\mathcal{R}^c$(3780 Å/4580 Å).
On the other hand, their analysis did not confirm the results of Bailey et al. (2009) and Blondin
et al. (2011). Thus, the resulting models of these works are not completely consistent, and the
model for the prediction of $M$ has not been established.

In previous studies, a linear regression model of $M$ has been assumed:

$$M_B \simeq M_{B,0} + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_L x_L, \qquad (1)$$

where $M_B$ is the absolute magnitude in the $B$-band, which has been conventionally used in past
studies, and $M_{B,0}$ is a constant. The vector $x = (x_1, x_2, \cdots, x_L)^T$ is a set of
explanatory variables of $M_B$. The elements of $x$ are, for example, the color, decay rate (or
light-curve width), and variables about the lines. $\beta = (\beta_1, \beta_2, \cdots, \beta_L)^T$
is the vector of their coefficients. Suppose that $N$ samples of SNe Ia are available, and the
observations are summarized as $y \simeq X\beta$, where $y = (M_{B,1}, M_{B,2}, \cdots, M_{B,N})^T$
and $X = (x_1, \cdots, x_N)^T$. The goal of the study is to find an appropriate set of variables in
$x$ for the prediction of $y$. We prefer the model to have a small generalization error for the
prediction of $y$.

If $N > L$, it is possible to estimate the values of all elements of $\beta$ with the
least-squares method. However, the risk of over-fitting increases as $N/L$ becomes smaller.
Furthermore, the least-squares method cannot determine a unique model when $N < L$. Such a
situation can arise when arbitrary flux ratios in spectra are included in $X$. Hence, previous
studies included only one or two flux ratios in a model, and searched for the best set of variables
for the observations.
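
The non-uniqueness for $N < L$ can be checked numerically. The following is a minimal sketch with synthetic data (the sizes, the random seed, and all variable names are illustrative assumptions, not values from this study): two different coefficient vectors reproduce the data equally well.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 5, 8                          # fewer samples than variables (illustrative sizes)
X = rng.normal(size=(N, L))
y = rng.normal(size=N)

# One least-squares solution (the minimum-norm one).
beta_a, *_ = np.linalg.lstsq(X, y, rcond=None)

# The last right-singular vector lies in the null space of X when N < L,
# so adding any multiple of it leaves the fitted values unchanged.
_, _, vt = np.linalg.svd(X)
beta_b = beta_a + 3.0 * vt[-1]

resid_a = np.linalg.norm(y - X @ beta_a)
resid_b = np.linalg.norm(y - X @ beta_b)
print(resid_a, resid_b)              # both vanish to rounding error
```

Because both vectors fit the data perfectly, the data alone cannot distinguish between them; an additional criterion, such as the regularization term introduced in section 2, is needed.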

Finding an appropriate set of variables to describe $M_B$ of SNe Ia is a variable selection
problem, which has been studied in the fields of statistics and machine learning. In this paper, we
report the result of a variable selection approach applied to $M_B$. We controlled the
generalization error with a regularization term, whose size is chosen via cross-validation, and a
subset of the variables is selected from the $L$ components by the Least Absolute Shrinkage and
Selection Operator, the so-called LASSO method (Tibshirani 1996). This method can find a unique
solution even in the case of $N < L$. In section 2, we describe the method. In section 3, we report
the results of our experiments, in which we apply our method to the data provided by the Berkeley
supernova database. In section 4, we discuss the implications of our results, and summarize our
findings.

2. Method 

2.1. LASSO-type estimation 

Here, we consider a linear regression model, $y = X\beta + \epsilon$, where $X$ is a given real
$N \times L$ matrix and $\epsilon$ is Gaussian noise with $E[\epsilon] = 0$ and
$E[\epsilon\epsilon^T] = \sigma^2 I_N$. Our goal is to find an appropriate set of variables from
$L$ variables and $N$ samples and to compute the corresponding coefficients $\beta$. For this sort
of estimation problem, Tibshirani (1996) proposed the Least Absolute Shrinkage and Selection
Operator (LASSO) for selecting the best set of explanatory variables. LASSO provides a solution
$\hat{\beta}_\lambda$ by minimizing the following function, which includes the $\ell_1$-norm of
$\beta$ as a regularization term:

$$\hat{\beta}_\lambda = \arg\min_{\beta} \left\{ \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 \right\}, \qquad (2)$$

where $\|\beta\|_1$ is the $\ell_1$-norm, defined as $\|\beta\|_1 = \sum_i |\beta_i|$, and
$\lambda$ is a tunable constant. The estimate $\hat{\beta}_\lambda$ includes zero components; that
is, variable selection is realized with the LASSO-type estimation. The number of zero components
increases as $\lambda$ becomes larger.
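
For intuition on why the $\ell_1$ term yields exactly zero components, consider the special case of an orthonormal design ($X^T X = I$), for which equation (2) has a closed-form solution given by soft-thresholding $X^T y$ at $\lambda/2$. The sketch below illustrates this special case only; it is not the solver used in this paper, and the data and names are hypothetical.

```python
import numpy as np

def soft_threshold(b, t):
    """Shrink each element of b toward zero by t, setting small ones to exactly zero."""
    return np.sign(b) * np.maximum(np.abs(b) - t, 0.0)

def lasso_orthonormal(X, y, lam):
    """Minimizer of ||y - X b||_2^2 + lam * ||b||_1, valid only when X.T @ X = I."""
    return soft_threshold(X.T @ y, lam / 2.0)

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(20, 5)))    # matrix with orthonormal columns
beta_true = np.array([2.0, 0.0, -1.0, 0.0, 0.3])  # hypothetical sparse coefficients
y = Q @ beta_true

# Count the exactly-zero components for increasing lambda.
n_zero = [int(np.sum(lasso_orthonormal(Q, y, lam) == 0.0)) for lam in (1.0, 3.0, 5.0)]
print(n_zero)
```

The surviving coefficients are also shrunk by $\lambda/2$, which is the origin of the systematic underestimation discussed in section 2.3.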

We apply the LASSO-type estimation in order to select an appropriate model to predict $M_B$ of
SNe Ia. The data, $y$, is $M_B$, and each column of $X$ corresponds to an observed variable, such
as the color, light-curve width, and variables about spectra. Recent projects have provided
high-quality and uniform samples of SNe Ia in both photometric and spectroscopic data. The number
of available samples, $N$, is now $\sim 100$. The number of candidate explanatory variables can be
$> 10^4$ if arbitrary flux ratios are included. However, we can expect that the number of effective
variables is small. In other words, our interest focuses on a model in which $M_B$ is explained not
with $\sim 10^4$, but with only a few variables of $x$. An exhaustive search over every subset of
candidate variables is not tractable, and the LASSO-type estimation gives us a data-driven approach
to select the best subset of variables for the data set.

2.2. Cross-validation 


The cost function for the estimation expressed in equation (2) contains a tunable parameter,
$\lambda$. This parameter controls the weight of the regularization term, which has an influence on
the generalization error. We choose the best $\lambda$ by the cross-validation method. In $K$-fold
cross-validation, the data are divided into $K$ roughly equal sub-samples, $y_k$
($k = 1, 2, \cdots, K$). For each $k$, the training data are defined as all the $K - 1$ sub-samples
except for the validation data, $y_k$. The optimization of the model to the training data gives
$\hat{\beta}_{k,\lambda}$ at a certain $\lambda$. The generalization error of the model is evaluated
with the mean of the weighted mean square errors (wMSE; $E(\lambda)$) of the $K$ sub-samples:


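$$E(\lambda) = \frac{1}{K} \sum_{k=1}^{K} E_k(\lambda), \qquad (3)$$

$$E_k(\lambda) = \frac{1}{M_k} \sum_{i=1}^{M_k} \frac{(y_{k,i} - \hat{y}_{k,i})^2}{\sigma_{k,i}^2}, \qquad (4)$$

$$\hat{y}_{k,i} = \sum_{j=1}^{L} x_{i,j} \hat{\beta}_{k,\lambda,j}, \qquad (5)$$

where $M_k$ is the number of validation data in $y_k$, and $\sigma_{k,i}$ is the measurement error
of the $i$-th element of $y_k$.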

In a very large $\lambda$ regime, the least-squares term is large, and thereby $E(\lambda)$ also
becomes large. In a very small $\lambda$ regime, on the other hand, the model can reproduce the
noise in the data (over-fitting), and thereby has a large generalization error, which eventually
leads to a large $E(\lambda)$. Thus, we can find the minimum value of $E(\lambda)$ at a certain
$\lambda$. The best model can be considered to be the simplest model whose $E(\lambda)$ is within
one standard error of the minimal $E(\lambda)$. This is the so-called “one-standard-error rule.”
Models having $\lambda$ smaller than the best one are statistically indistinguishable from the
over-fitting situation. In this paper, we use this rule to select $\lambda$, and set $K = 10$.
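
The one-standard-error rule can be sketched as follows, assuming that the per-fold errors $E_k(\lambda)$ have already been computed on a grid of $\lambda$. The function name and the toy numbers are hypothetical, and the standard error is taken here as the sample standard deviation of the $K$ fold errors divided by $\sqrt{K}$, which is a common convention rather than something specified in the text.

```python
import numpy as np

def one_standard_error_rule(lambdas, fold_errors):
    """Return (lambda at minimal mean error, lambda chosen by the one-standard-error rule).

    lambdas     : array of shape (n_lambda,)
    fold_errors : array of shape (K, n_lambda) holding E_k(lambda) for each fold
    """
    lambdas = np.asarray(lambdas, dtype=float)
    fold_errors = np.asarray(fold_errors, dtype=float)
    K = fold_errors.shape[0]
    mean_err = fold_errors.mean(axis=0)                       # E(lambda)
    std_err = fold_errors.std(axis=0, ddof=1) / np.sqrt(K)    # standard error of E(lambda)
    i_min = int(np.argmin(mean_err))
    threshold = mean_err[i_min] + std_err[i_min]
    # A larger lambda gives a sparser, i.e. simpler, model.
    eligible = np.where(mean_err <= threshold)[0]
    i_1se = int(eligible[np.argmax(lambdas[eligible])])
    return lambdas[i_min], lambdas[i_1se]

# Toy example with K = 3 folds and four values of lambda (made-up numbers).
lambdas = [0.01, 0.1, 1.0, 10.0]
fold_errors = [[1.4, 0.9, 1.0, 3.0],
               [1.6, 1.3, 1.4, 3.2],
               [1.5, 1.1, 1.2, 3.1]]
lam_min, lam_1se = one_standard_error_rule(lambdas, fold_errors)
print(lam_min, lam_1se)
```

In this toy example the mean error is minimal at $\lambda = 0.1$, but the error at $\lambda = 1$ lies within one standard error of that minimum, so the sparser model at $\lambda = 1$ is preferred.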

Another common variable selection scheme is to use an information criterion, such as the Akaike
information criterion (AIC) or the Bayesian information criterion (BIC). We employed the
regularization term and cross-validation because we expect not only the observation noise
$\epsilon$ but also measurement errors in $X$, and we do not have a good model selection criterion
for this situation. The measurement error of $M_B$ is occasionally quite small, of the order of
0.01 mag. On the other hand, the error of the elements of $X$ can be large. For example, a ratio
between low fluxes can have a large error.

2.3. Demonstration of the method 

We performed simple simulations of the LASSO-type estimation for the current problem. The vector
$\beta$ was set to be a sparse vector, containing only three non-zero values in $L$ elements. We
set three cases: $L = 10^2$, $10^3$, and $10^4$. The matrix $X$ was set to be an $N \times L$
matrix whose elements were random values generated by $\mathcal{N}(0,1)$, a normal distribution
with a mean of 0 and a variance of 1. We set $N = 50$ in all cases. Then, we calculated the data
vector, $X\beta$, and added noise, $y = X\beta + \epsilon$,
$\epsilon \sim \mathcal{N}(0, 0.01\sigma_y^2 I_N)$, where $\sigma_y$ represents the standard
deviation of observation noise. Here, we assumed a small error in $y$ because $M_B$ is occasionally
determined with such a high precision. We also added noise to the elements of $X$,
$\tilde{x}_{ij} = x_{ij} + e_{ij}$, $e_{ij} \sim \mathcal{N}(0, \sigma_X^2)$, and generated
$\tilde{X}$. We assumed small and large errors in $X$, that is, $\sigma_X = 0.01$ and $0.25$. We
estimated $\beta$ from $y$ and $\tilde{X}$ using the $\ell_1$-norm minimization. The best model and
its $\lambda$ were determined by cross-validation. The results are shown in figure 1. In the case
of the small $\sigma_X$, the assumed $\beta$, indicated by the red points, are successfully
reconstructed in all $L$ cases, albeit with a 3-20% systematic bias. In the case of the large
$\sigma_X$, all non-zero elements of $\beta$ are detected in the cases of $L = 10^2$ and $10^3$,
while their coefficients are significantly underestimated and weak false signals are also seen. In
the case of $L = 10^4$, the assumed weak signal is lost in the reconstruction, and false signals
have large coefficients.
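
A reduced version of this experiment can be sketched with a plain coordinate-descent solver for equation (2). This is an illustration under stated assumptions rather than the procedure used for figure 1: the positions and values of the non-zero coefficients, the noise level of $y$, the random seed, and the fixed value of $\lambda$ (chosen by hand here, not by cross-validation) are all hypothetical.

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate descent for min_b ||y - X b||_2^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = np.sum(X**2, axis=0)          # x_j^T x_j for each column
    resid = y.astype(float).copy()         # current residual y - X beta
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the residual that excludes its own contribution.
            rho = X[:, j] @ resid + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam / 2.0, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

rng = np.random.default_rng(0)
N, L = 50, 100                             # N < L, as in the smallest case of the text
true_idx = [10, 40, 70]                    # hypothetical positions of the three signals
beta_true = np.zeros(L)
beta_true[true_idx] = [1.0, -0.8, 0.5]     # hypothetical values

X = rng.normal(size=(N, L))
y = X @ beta_true + rng.normal(scale=0.01, size=N)     # small error in y (assumed level)
X_noisy = X + rng.normal(scale=0.01, size=(N, L))      # small error in X (sigma_X = 0.01)

beta_hat = lasso_cd(X_noisy, y, lam=10.0)              # lambda fixed by hand for this sketch
top3 = sorted(np.argsort(-np.abs(beta_hat))[:3].tolist())
print(top3, np.count_nonzero(beta_hat))
```

Replacing the scale of the noise added to `X` by 0.25 and increasing `L` gives a rough impression of the degradation described above.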

This experiment demonstrated two important points about the proposed method. First, it can
reconstruct the original vector even in the case of $N < L$. Second, even with this method, we
cannot avoid detecting false signals that coincidentally fit the data in the case of a large $L$.
The latter point could have a significant implication for using arbitrary flux ratios in the
current problem. The number of flux ratios is more than 17000, while the number of samples is
$< 100$. Hence, we should reduce the number of columns in $X$ in order to avoid detecting false
signals. In this paper, as described in the next subsection, we use two kinds of spectra,
normalized by the continuum level and by the total flux.

LASSO tends to underestimate the coefficients if the measurement error of the target variable is
not negligible, as can be seen in figure 1. Hence, it should be used to select the best set of
variables. Then, the model of $M_B$ can be obtained by a refit to the data with the selected
variables. In this paper, we focus on variable selection.
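
The refit step might look like the following sketch, which assumes an unweighted ordinary least-squares fit with an intercept on the selected columns; the helper name and the synthetic data are hypothetical, and measurement-error weights are ignored.

```python
import numpy as np

def refit_selected(X, y, selected):
    """Ordinary least squares on the selected columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X[:, selected]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]               # intercept, slopes

# Synthetic, noise-free example in which only columns 1 and 4 matter.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 6))
y = -19.0 + 2.0 * X[:, 1] - 0.5 * X[:, 4]
intercept, slopes = refit_selected(X, y, selected=[1, 4])
print(intercept, slopes)
```

Because no penalty is applied in this step, the refitted coefficients are free of the shrinkage introduced by the $\ell_1$ term.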

2.4. Sample and variables

We used the data from the SuperNova DataBase provided by the Berkeley Supernova Ia Program
(http://hercules.berkeley.edu/database/index_public.html). Our sample selection was based on the
criteria in Silverman et al. (2012): The redshift of the sample ranged from 0.01 to 0.1. We used
the spectral data from 3500 to 8500 Å. The rest-frame days relative to maximum ranged from $-5$ to
$+5$ d. For each object with multiple spectra available, we used the spectrum having the smallest
value of the rest-frame days relative to maximum. We only used samples having the color parameter,
$c$, less than 0.5. We found two Type Iax objects, SN 2003gq and SN 2005hk, in the sample and
excluded them (Foley et al. 2013). As a result, we found 78 objects in the database. The available
data contain, for example, the redshift, $z$, light-curve width, $x_1$, color, $c$, apparent
magnitude, $m_B$, and spectra. As mentioned in section 1, it is believed that $x_1$ and $c$ are
important explanatory variables for $M_B$. We calculated $M_B$ from $m_B$ and $z$ by adopting the
standard $\Lambda$ cold dark matter cosmology with $\Omega_m = 0.27$, $\Omega_\Lambda = 0.73$,
and $w = -1$.
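
The conversion from $m_B$ and $z$ to $M_B$ can be sketched as below for a flat $\Lambda$CDM cosmology. The Hubble constant is not stated in the text, so $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$ is an assumption made only for this illustration, as are the function names; corrections such as the K-correction or peculiar velocities are not included.

```python
import numpy as np

C_KM_S = 299792.458                        # speed of light [km/s]

def distance_modulus(z, h0=70.0, omega_m=0.27, omega_l=0.73, n_grid=2001):
    """Distance modulus for flat LambdaCDM with w = -1 (h0 is an assumed value)."""
    zz = np.linspace(0.0, z, n_grid)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zz) ** 3 + omega_l)
    dz = zz[1] - zz[0]
    # Trapezoidal rule for the comoving-distance integral of 1/E(z).
    integral = dz * (inv_e.sum() - 0.5 * (inv_e[0] + inv_e[-1]))
    d_l_mpc = (1.0 + z) * (C_KM_S / h0) * integral      # luminosity distance [Mpc]
    return 5.0 * np.log10(d_l_mpc * 1.0e6 / 10.0)

def absolute_magnitude(m_b, z, **cosmology):
    """M_B = m_B - mu(z)."""
    return m_b - distance_modulus(z, **cosmology)

print(distance_modulus(0.01), absolute_magnitude(14.0, 0.01))
```

A different choice of $H_0$ shifts all $M_B$ values by the same constant, which is absorbed in $M_{B,0}$ and does not affect the variable selection.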

The calibration of the spectral data was performed in the standard manner. The flux was corrected
for the reddening in our galaxy using $E(B-V)$. We used the $E(B-V)$ values obtained from the
supernova database, which refers to Schlegel et al. (1998) and Peek, Graves (2010). The redshift
correction was applied to the wavelength. Then, the spectra were divided into 134 bins equally
spaced on the logarithmic velocity scale between 3500 and 8500 Å, as in Silverman et al. (2012).
We calculated the arbitrary flux ratios using the binned spectra. The number of ratios is then
$134 \times 133 = 17822$.
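
The binning and the count of ratios can be illustrated as follows. Bins equally spaced in logarithmic velocity are equally spaced in $\log\lambda$; averaging the flux within each bin and taking the geometric mean of the bin edges as the bin center are assumptions of this sketch, and the function names are hypothetical.

```python
import numpy as np

def log_bins(lam_min=3500.0, lam_max=8500.0, n_bins=134):
    """Bin edges equally spaced in log wavelength, and their geometric-mean centers."""
    edges = np.geomspace(lam_min, lam_max, n_bins + 1)
    centers = np.sqrt(edges[:-1] * edges[1:])
    return edges, centers

def bin_spectrum(wave, flux, edges):
    """Mean flux in each bin; bins without data are set to NaN."""
    idx = np.digitize(wave, edges) - 1
    n_bins = len(edges) - 1
    binned = np.full(n_bins, np.nan)
    for b in range(n_bins):
        sel = idx == b
        if np.any(sel):
            binned[b] = flux[sel].mean()
    return binned

edges, centers = log_bins()
n_ratios = len(centers) * (len(centers) - 1)     # ordered pairs of distinct bins
print(round(centers[0]), round(centers[-1]), n_ratios)

# Flat toy spectrum sampled every ~1 Angstrom.
wave = np.linspace(3500.0, 8499.0, 5000)
binned = bin_spectrum(wave, np.ones_like(wave), edges)
```

With these assumptions the first and last bin centers fall near 3512 and 8472 Å, in agreement with the wavelengths that label the binned fluxes in section 3.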

Including arbitrary flux ratios may provide an exhaustive search for an appropriate set of
explanatory variables of $M_B$. However, the number of candidate variables is so large that false
signals can be detected, as demonstrated in the last subsection. Hence, we need to consider other
sets of candidate variables which are related to the flux ratios, but have a much smaller
dimension. In this paper, we use two kinds of normalized spectra.

First, the variables of most interest are the flux ratios of the line areas to the continuum
level. Indeed, most of the previously proposed ratios are such variables:
$\mathcal{R}$(6420/4430) = Fe II/continuum (Bailey et al. 2009),
$\mathcal{R}$(6630/4400) = Fe II/continuum, $\mathcal{R}$(6420/5290) = continuum/S II, and
$\mathcal{R}$(4610/4260) = continuum/Fe II (Blondin et al. 2011). They can be substituted by the
spectra normalized by the continuum level. The continuum level was approximated by a cubic
smoothing spline fitted to masked spectra. The mask is depicted in figure 2 with the binned spectra
of a typical sample, SN 2006et. The data points indicated by the filled circles were used to
calculate the continuum curve. In addition, the points with the maximum flux in each shaded area
were also used. Several examples of the continuum-normalized spectra are shown in the lower panel
of figure 3. We call the set of continuum-normalized spectra $f_{\rm cnt}$.

Second, the local colors in the continuum, which may carry information independent of the
broadband colors, are also variables of interest. They can be substituted by the spectra normalized
by the total flux between 3500 and 8500 Å. We call the set of these total-flux-normalized spectra
$f_{\rm tot}$. The intrinsic color can be bluer than the observed one because of the interstellar
reddening in the host galaxy. In previous studies, the color correction for this effect has been
performed by assuming that all SNe Ia have the same intrinsic color. We also performed this
correction using the SNe Ia color law for the SALT2 data (Guy et al. 2007). The color-corrected
spectra are then normalized by the total flux, and named $f^c_{\rm tot}$.

Fig. 1. Simulations of the LASSO-type estimation. The red and black points represent the assumed
and estimated values, respectively. The number of samples is 50. The numbers of explanatory
variables are $L = 10^2$ (left), $10^3$ (middle), and $10^4$ (right), as shown in each panel. The
upper and lower panels depict the cases for small and large errors assumed in $X$.

Fig. 2. Mask for calculating the continuum level. The spectrum of SN 2006et is also plotted as a
reference. The horizontal axis is the wavelength (Å). For details, see the text.


We include those two kinds of normalized spectra, ($f_{\rm cnt}$, $f_{\rm tot}$) or
($f_{\rm cnt}$, $f^c_{\rm tot}$), as the candidates, instead of the arbitrary flux ratios. In
addition, we use the flux on a logarithmic scale in order to include the information of arbitrary
flux ratios. We can identify a good flux-ratio parameter by searching for two fluxes having similar
coefficients with opposite signs: $c \cdot \log(f_1/f_2) = c \cdot \log(f_1) - c \cdot \log(f_2)$.
Figure 3 shows examples of the spectra that are normalized by the total flux ($f_{\rm tot}$, the
upper panel) and by the continuum ($f_{\rm cnt}$, the lower panel).

As well as $x_1$, $c$, $f_{\rm cnt}$, $f_{\rm tot}$, and $f^c_{\rm tot}$, we include previously
proposed flux ratios, $\mathcal{R}$, in the model as candidate explanatory variables for $M_B$. We
consider six flux ratios proposed in Bailey et al. (2009), Blondin et al. (2011), and Silverman
et al. (2012), that is, $\mathcal{R}$ = {$\mathcal{R}$(3780/4580), $\mathcal{R}$(4610/4260),
$\mathcal{R}$(5690/5360), $\mathcal{R}$(6420/4430), $\mathcal{R}$(6420/5290),
$\mathcal{R}$(6630/4400)}. The flux ratios calculated from the color-corrected spectra,
$f^c_{\rm tot}$, are called $\mathcal{R}^c$.

Fig. 3. Examples of the spectra in our sample. Upper panel: the spectra normalized by the total
flux between 3500 and 8500 Å. Lower panel: the spectra normalized by the continuum. The spectra of
three examples, SN 2005dm, SN 2000dk, and SN 2008ar, are shown in both panels.

Silverman et al. (2012) present tables of measured values of the lines: Ca II H&K and
near-infrared triplet, Si II 4000, 5972, and 6355 Å, Mg II, Fe II, S II “W,” and O I triplet. We
can use the pEW, $\Delta$pEW (i.e., the measured pEW minus the template evolution), velocity ($v$),
line depth ($a$), and FWHM as explanatory variables. We note that those line variables are
incomplete for our sample. Hence, the number of samples is reduced when those line variables are
used as candidate variables, and we used them element by element. We represent a set of the line
values as $\mathcal{L}$. For example, $\mathcal{L}_{\rm Si\,II\,4000}$ means those variables of
Si II 4000 Å.

Fig. 4. Cross-validation curve for Model 1. The lower and upper horizontal axes denote $\lambda$
(on a logarithmic scale) and the number of non-zero elements, respectively. The vertical axis
denotes the wMSE. The left dotted line indicates the $\lambda$ having the minimal wMSE. The right
dotted line indicates the best model under the one-standard-error rule.

For the optimization of the model to the data, we used the glmnet package for R.$^2$ The
selection of $\lambda$ was performed using the function for cross-validation, cv.glmnet, adopting
the one-standard-error rule. The cross-validation is based on random sub-sampling, and the selected
variables might be influenced by it. We therefore performed $10^4$ experiments for each model, and
calculated the selection probability, $p$, of each variable. In this paper, we discuss only the
selected variables with $p > 0.3$. Each column of $X$ was normalized to have zero mean and unit
variance by a linear scaling, $x'_{ij} = (x_{ij} - \bar{x}_j)/\sigma_j$, where $\bar{x}_j$ and
$\sigma_j$ are the mean and standard deviation of the $j$-th column. We need this normalization to
compare the coefficients, $\beta$, of variables having different units. The list of objects and
explanatory variables used in this paper is available as online supplementary material.
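
The column normalization and the selection probability can be sketched as follows. The coefficient array is a hypothetical stand-in for the output of repeated cross-validated fits (four runs instead of $10^4$), and the function names are illustrative.

```python
import numpy as np

def standardize_columns(X):
    """Scale each column to zero mean and unit variance: (x_ij - mean_j) / sigma_j."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def selection_probability(coef_runs):
    """Fraction of runs in which each variable has a non-zero coefficient.

    coef_runs : array of shape (n_runs, L), one row of fitted coefficients per run
    """
    return np.mean(np.asarray(coef_runs) != 0.0, axis=0)

# Made-up coefficients from four repeated fits with three candidate variables.
coef_runs = [[0.4, 0.0, 0.0],
             [0.3, 0.1, 0.0],
             [0.5, 0.0, 0.0],
             [0.4, 0.0, 0.0]]
p = selection_probability(coef_runs)
selected = p > 0.3                     # threshold adopted in the text
print(p, selected)

rng = np.random.default_rng(3)
Z = standardize_columns(rng.normal(loc=5.0, scale=2.0, size=(30, 4)))
```

In this toy case only the first variable passes the $p > 0.3$ threshold; the second is selected in a single run and would be discarded.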

3. Results 

First, we choose the light-curve width ($x_1$), color ($c$), spectra normalized by the total flux
($f_{\rm tot}$), those normalized by the continuum ($f_{\rm cnt}$), and previously proposed flux
ratios ($\mathcal{R}$) as the candidate explanatory variables, and $M_B$ as the target variable. We
call this complete model Model 1.

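It can be written as:

$$\begin{aligned}
M_B = {} & M_{B,0} + \beta_1 c + \beta_2 x_1 \\
& + \beta_3 f_{\rm tot}(3512) + \beta_4 f_{\rm tot}(3534) + \cdots + \beta_{136} f_{\rm tot}(8472) \\
& + \beta_{137} f_{\rm cnt}(3512) + \beta_{138} f_{\rm cnt}(3534) + \cdots + \beta_{270} f_{\rm cnt}(8472) \\
& + \beta_{271} \mathcal{R}(3780/4580) + \beta_{272} \mathcal{R}(4610/4260) \\
& + \beta_{273} \mathcal{R}(5690/5360) + \beta_{274} \mathcal{R}(6420/4430) \\
& + \beta_{275} \mathcal{R}(6420/5290) + \beta_{276} \mathcal{R}(6630/4400) + \epsilon. \qquad (6)
\end{aligned}$$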

Using the LASSO-type method for the 78 samples of $M_B$, we choose the appropriate set of
explanatory variables from the 276 candidates and estimate the coefficient vector $\beta$. The
tuning parameter, $\lambda$, is determined by cross-validation. Figure 4 shows the cross-validation
curve for Model 1. In this figure, we can confirm that the wMSE takes its minimum value within the
given range of $\lambda$, and the best model is properly determined by the one-standard-error rule.

Table 1 lists all models and results presented in this paper. From Model 1, the classical
variables, that is, $c$ and $x_1$, are selected. $f_{\rm tot}$(6373) is also selected, having a
coefficient even larger than that of $x_1$ in absolute value. Figure 5 indicates the non-zero
elements of $f_{\rm tot}$ and $f_{\rm cnt}$. As can be seen in this figure, $f_{\rm tot}$(6373),
indicated by the red vertical line, lies in the continuum area. Hence, it may be related to the
local color, which could carry specific information not contained in the broad-band color, $c$. As
can be seen in figure 5, some fluxes in line regions are also selected: $f_{\rm cnt}$(6084) and
$f_{\rm cnt}$(6289) are probably related to the continuum-normalized depth of Si II 6355. In
addition, $\mathcal{R}$(3780/4580) and $f_{\rm tot}$(3752) are probably related to Ca II H&K.
$f_{\rm cnt}$(6631) corresponds to the continuum flux of the continuum-normalized spectra, which
suggests a false signal.

Table 1. Models and Results

| Model | Target variable $y$ ($N$) | Explanatory variables $X$ ($L$) | Non-zero elements | Coefficient $\beta$ | $p$ |
|---|---|---|---|---|---|
| 1 | $M_B$ (78) | $x_1$, $c$, $f_{\rm tot}$, $f_{\rm cnt}$, $\mathcal{R}$ (276) | $c$ | 0.376 | 1.00 |
| | | | $f_{\rm tot}$(6373) | 0.100 | 1.00 |
| | | | $x_1$ | -0.050 | 0.98 |
| | | | $f_{\rm cnt}$(6084) | -0.034 | 0.98 |
| | | | $f_{\rm cnt}$(6289) | -0.045 | 0.95 |
| | | | $f_{\rm cnt}$(6631) | -0.061 | 0.80 |
| | | | $\mathcal{R}$(3780/4580) | -0.050 | 0.74 |
| | | | $f_{\rm tot}$(3752) | 0.063 | 0.73 |
| 2 | $M_B - \beta_1 c$ (78) | $x_1$, $f_{\rm tot}$, $f_{\rm cnt}$, $\mathcal{R}$ (275) | $x_1$ | -0.020 | 0.99 |
| 3 | $M_B - \beta_1 c$ (78) | $x_1$, $f^c_{\rm tot}$, $f_{\rm cnt}$, $\mathcal{R}^c$ (275) | $x_1$ | -0.014 | 0.85 |
| 4a | $x_1$ (76) | $c$, $f^c_{\rm tot}$, $f_{\rm cnt}$, $\mathcal{R}^c$, $\mathcal{L}_{\rm Si\,II\,4000}$ (280) | $\Delta$pEW Si II 4000 | -0.455 | 1.00 |
| | | | $f_{\rm cnt}$(5770) | 0.518 | 1.00 |
| | | | $f_{\rm cnt}$(3982) | -0.262 | 1.00 |
| | | | $f_{\rm cnt}$(7038) | -0.485 | 0.96 |
| | | | $f_{\rm tot}$(4988) | -0.238 | 0.77 |
| | | | $f_{\rm cnt}$(6084) | 0.281 | 0.62 |
| 4b | $x_1$ (74) | $c$, $f^c_{\rm tot}$, $f_{\rm cnt}$, $\mathcal{R}^c$, $\mathcal{L}_{\rm S\,II\,``W''}$ (280) | $f_{\rm cnt}$(5770) | 1.034 | 1.00 |
| | | | $f_{\rm cnt}$(6084) | 0.440 | 1.00 |
| | | | $f_{\rm tot}$(6458) | 0.300 | 1.00 |
| | | | $f_{\rm cnt}$(3982) | 0.041 | 1.00 |
| | | | $f_{\rm cnt}$(7179) | 0.289 | 0.99 |
| | | | $f_{\rm cnt}$(6458) | -0.236 | 0.94 |
| | | | $f_{\rm cnt}$(6331) | 0.612 | 0.92 |
| 5 | $M_B - (\beta_1 c + \beta_2 x_1)$ (78) | $f_{\rm tot}$, $f_{\rm cnt}$, $\mathcal{R}^c$ (273) | | | |

We confirmed that the result was consistent even when we included the line variables,
$\mathcal{L}$: all the line variables have zero coefficients. The lack of dependency on
$\mathcal{L}$ is common to the subsequent models, except for Model 4 (see below). Hence, we present
only the models without $\mathcal{L}$ in this paper.

In general, when some explanatory variables are correlated, LASSO tends to select only a few of
them. In the present case, the variables are measurements having non-negligible errors. Among
correlated variables, a variable having a smaller error results in a smaller generalization error
of $M_B$. Hence, in the case of large $\lambda$, the variable having the smallest error is selected
first. In the case of smaller $\lambda$, the other correlated variables are selected. It is
possible that a high correlation between $c$ and $f_{\rm tot}$(6373) causes the non-zero
coefficient of $f_{\rm tot}$(6373) in Model 1. If this is the case, it is unclear whether
$f_{\rm tot}$(6373) is a significant variable that carries information independent of $c$. We
performed a regression analysis with $M_B = \beta_1 c + M_{B,0}$, and corrected for the effect of
$c$ on $M_B$ by using $M_B - \beta_1 c$ as the target. We call this model Model 2. The number of
samples is the same as that of Model 1, 78, while the number of candidate explanatory variables is
275, one smaller than in Model 1 because $c$ is omitted. As can be seen in table 1, $x_1$ is the
only variable having a non-zero coefficient. A similar result was obtained for Model 3, where we
used the color-corrected spectral data, $f^c_{\rm tot}$ and $\mathcal{R}^c$, instead of
$f_{\rm tot}$ and $\mathcal{R}$. Hence, the lack of $f_{\rm tot}$(6373) is independent of the color
correction. These results suggest that the high correlation between $c$ and $f_{\rm tot}$(6373)
causes the apparently high coefficient of $f_{\rm tot}$(6373) in Model 1.

Fig. 5. Non-zero elements of spectral data in Model 1. The red and blue lines indicate the
non-zero elements in the total-flux-normalized and continuum-normalized spectra, respectively. The
spectrum of SN 2006et is also plotted as a reference. The solid line, dashed line, and red points
are the unbinned spectrum, estimated continuum level, and binned spectrum, respectively.

Fig. 6. Same as figure 5, but for Models 4a and 4b.

As well as $f_{\rm tot}$(6373), Model 1 indicates a possible dependency of $M_B$ on the variables
related to the line areas, that is, $f_{\rm cnt}$(6084) and $f_{\rm cnt}$(6289). It has been
reported that $x_1$ depends on the line strength, for example, the EW of Si II 4000 (e.g. Hachinger
et al. 2006; Arsenijevic et al. 2008). It is possible that the line dependency in Model 1 is due to
a high correlation between $x_1$ and the line strength of Si II. To examine this possibility, we
considered Model 4, in which the target is $x_1$. Model 4a includes the pEW, $\Delta$pEW, $v$, $a$,
and FWHM of Si II 4000, as well as $c$, $f^c_{\rm tot}$, $f_{\rm cnt}$, and $\mathcal{R}^c$, as the
candidate explanatory variables of $x_1$. This is the only case in which the coefficients of
$\mathcal{L}$ have non-zero values in the analysis presented in this paper. Model 4b is for S II
“W,” as a typical case of the other lines. The results are shown in table 1 and figure 6. In
Model 4a, $\Delta$pEW of Si II 4000 has a non-zero coefficient. The importance of this line is also
confirmed by the selection of $f_{\rm cnt}$(3982) in both Models 4a and 4b. In addition to Si II
4000, $f_{\rm cnt}$(5770) and $f_{\rm cnt}$(6084) have non-zero coefficients in both models,
corresponding to Si II 5972 and 6355. There are several other non-zero elements in Model 4a,
although they are not confirmed in Model 4b. The dependence of $x_1$ on Si II supports the previous
studies of $x_1$.

Finally, we employed Model 5, in which the target is $M_B$ corrected for $c$ and $x_1$, that is,
$M_B - (\beta_1 c + \beta_2 x_1)$, where $\beta_1$ and $\beta_2$ are determined by a regression
analysis. The candidate explanatory variables of Model 5 are $f_{\rm tot}$, $f_{\rm cnt}$, and
$\mathcal{R}^c$. However, none of them is selected. The result suggests that the high correlation
between $x_1$ and the Si II line strength results in the apparent dependency of $M_B$ on the line
depths in Model 1. Hence, the best set of explanatory variables is ($c$, $x_1$) in our analysis. We
refit the data with these variables, and obtained the following model:

$$M_B = -19.26(\pm 0.03) + 2.75(\pm 0.17)\,c - 0.10(\pm 0.02)\,x_1. \qquad (7)$$

Note that these values are calculated not from the normalized values of the variables as in
table 1, but from the raw values.

4. Discussion and Conclusion 

Our analysis confirms the classical understanding of SNe Ia, that is, i) the light-curve width
($x_1$) and color ($c$) are the important explanatory variables of the absolute magnitude at
maximum ($M_B$) (Phillips 1993), and ii) the light-curve width correlates with the strength (EW or
depth) of Si II (e.g. Hachinger et al. 2006; Arsenijevic et al. 2008). Furthermore, our variable
selection approach using the LASSO-type estimation does not support adding any other variables,
such as the normalized spectra ($f_{\rm tot}$, $f^c_{\rm tot}$, $f_{\rm cnt}$), previously proposed
flux ratios ($\mathcal{R}$), and line measurements ($\mathcal{L}$), in order to obtain a better
generalization error of $M_B$. We confirmed that the above conclusion is robust to small changes in
our analysis: using the flux on a logarithmic or linear scale, excluding or including the two Type
Iax objects, and normalizing each column of $X$ or not. Our analysis implies that over-fitting can
cause the partly inconsistent results seen in previous studies which used arbitrary flux ratios
(Bailey et al. 2009; Blondin et al. 2011; Silverman et al. 2012).

Our conclusion is inconsistent with that reported by Silverman et al. (2012), although both
samples are obtained from the Berkeley supernova database with common data selection. The model
selection method is also common: Following Blondin et al. (2011), Silverman et al. (2012) performed
10-fold cross-validation, and calculated the mean and standard error of the 10 weighted root-mean
squares (wRMS) of residuals. They measured the significance of the improvement of the model with
the mean wRMS and its standard error. They found that the model with $c$, $x_1$, and
$\mathcal{R}^c$(3780/4580) improves the prediction error at a level of $1.7\sigma$ compared with
the classical one with $c$ and $x_1$. This flux ratio is also detected as an explanatory variable
in our Model 1, while it is not in the other models of $M_B$ (see table 1). The wavelength of
3780 Å corresponds to the mid-point of Ca II H&K, and 4580 Å to the border between the Mg II and
Fe II complexes. Figure 7 shows $\mathcal{R}^c$(3780/4580) of our sample against $c$. Those two
variables exhibit a weak anticorrelation, as can be seen in this figure. Our result that
$\mathcal{R}^c$(3780/4580) is detected in Model 1 and not in Models 2 and 3 can be explained by
this anticorrelation.

Fig. 7. $\mathcal{R}^c$(3780/4580) of our sample against $c$.

In principle, the spectral data, that is, the values of the flux density and their ratios, could
be important explanatory variables of $M_B$. The interstellar extinction is definitely the most
important variable. As well as the color parameter, $c$, the continuum flux of the
total-flux-normalized spectra, that is, $f_{\rm tot}$, could be an indicator of the extinction.
Indeed, $f_{\rm tot}$(6373) has a relatively large coefficient in Model 1. Our analysis suggests
that $c$ is a better variable than $f_{\rm tot}$(6373) and the other normalized fluxes. This is
probably because of the uncertainties in the measurements and observation epochs. In general, the
flux calibration of spectra has larger errors than differential photometry. Moreover, the
observation epochs of the spectra differ from one object to another in our sample. The color
parameter, $c$, is based on differential photometry, and corrected to the color at maximum. A
similar situation is also expected for the light-curve width, $x_1$. Mazzali et al. (2001) propose
that $x_1$, or the decline rate, the so-called $\Delta m_{15}$, is a function of the amount of
$^{56}$Ni produced in SNe. The absorption line variables are also possible indicators of the amount
of synthesized elements in SNe. Indeed, the correlation between $x_1$ and the strength of the Si II
lines was confirmed in Model 4. Our method selected not the Si II strength but $x_1$, probably
because $x_1$ has a smaller measurement error as an indicator of the amount of synthesized
elements. In our analysis, the best set of explanatory variables is $c$ and $x_1$, although,
needless to say, our result does not imply a physical causal relationship between $M_B$ and those
two variables.

It is possible that, in the future, an increasing number of samples will lead to a revised model
with a better generalization error through additional or alternative explanatory variables compared
with the model in this paper. It may also be meaningful to add variables which were not used in
this paper, such as those about the host galaxies of SNe Ia (Sullivan et al. 2010; Pan et al.
2015). In any of these cases, our proposed method offers a framework for finding an appropriate set
of explanatory variables of $M_B$ even when the number of samples is smaller than the number of
variables. A possible extension of the model would be to include the measurement errors of the
explanatory variables. As can be seen in equation (2), our method does not include these errors,
while the errors are expected to be large in several variables, for example, flux ratios of low
fluxes. Bailey et al. (2009), proposing the flux ratio $\mathcal{R}$(642 nm/443 nm) as a good
explanatory variable of $M_B$, claimed that the spectral slope needs to be calibrated with very
small errors for their model with the flux ratio. The instrument that they used was SNIFS
(SuperNova Integral Field Spectrograph), which was developed to perform flux calibration with high
accuracy (Aldering et al. 2002). On the other hand, the instruments used in Blondin et al. (2011)
and Silverman et al. (2012) were standard slit spectrographs. It is possible that the lack of
detection of $\mathcal{R}$(642 nm/443 nm) in our analysis is due to large errors of the flux ratios
in our data sample from Silverman et al. (2012). A better model including the errors might be
provided by a Bayesian approach in which the errors are included in the model as prior probability
distributions.